You are viewing the RapidMiner Studio documentation for version 10.0.
Documents to Data (Text Processing)
Synopsis
Generates a data set from documents.
Description
This operator generates a data set from a collection of documents. For each document in the collection, one example is added to the data set, and the text of the document is stored in a nominal attribute. If the documents carry a label or meta data, a label attribute or meta data attributes are created, respectively.
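The one-example-per-document conversion described above can be pictured with a small Python sketch. It is not RapidMiner code: the `Document` class and the `documents_to_rows` function are hypothetical stand-ins, and the column names `text` and `label` are illustrative assumptions. Each document becomes one row, its text goes into a single column, and a label and any meta data entries become additional columns.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Document:
    """Hypothetical stand-in for a text document (not a RapidMiner class)."""
    text: str
    label: Optional[str] = None
    metadata: Dict[str, Any] = field(default_factory=dict)


def documents_to_rows(documents: List[Document]) -> List[Dict[str, Any]]:
    """Turn each document into one row (example) of a table."""
    rows = []
    for doc in documents:
        # The document text is stored in a single column (name assumed: "text").
        row: Dict[str, Any] = {"text": doc.text}
        # A label, if present, becomes its own column (name assumed: "label").
        if doc.label is not None:
            row["label"] = doc.label
        # Each meta data entry (e.g. a file name or date) becomes a column.
        row.update(doc.metadata)
        rows.append(row)
    return rows


docs = [
    Document("great product", label="positive", metadata={"filename": "a.txt"}),
    Document("arrived broken", label="negative", metadata={"filename": "b.txt"}),
]
for row in documents_to_rows(docs):
    print(row)
```

With two input documents the sketch yields exactly two rows, matching the rule that each document contributes one example.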
Input
- documents (Collection)
The collection of documents to convert into a data set.
Output
- example set (Data Table)
The generated example set, containing one example per input document.
Parameters
- text_attribute: The name of the attribute that stores the document text. Range:
- label_attribute: The name of the label attribute. Range:
- add_meta_information: If checked, available meta information about each document, such as its file name and date, is added as attributes. Range:
- datamanagement: Determines how the data is represented internally. Range:
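To illustrate how the first three parameters could shape the output, here is a hedged Python sketch that only computes the list of attributes (name and role) the resulting table would contain. Everything in it is an assumption for illustration: `Document` and `output_schema` are hypothetical, the default names `"text"` and `"label"` are guesses rather than documented defaults, the `"regular"` role for meta data attributes is assumed, and `datamanagement` is not modelled at all.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class Document:
    # Hypothetical document stand-in: text, optional label, optional meta data.
    text: str
    label: Optional[str] = None
    metadata: Dict[str, Any] = field(default_factory=dict)


def output_schema(
    documents: List[Document],
    text_attribute: str = "text",      # assumed default name
    label_attribute: str = "label",    # assumed default name
    add_meta_information: bool = True,
) -> List[Tuple[str, str]]:
    """Return (attribute name, role) pairs the output table would contain."""
    # The text attribute is always created, under the configured name.
    schema = [(text_attribute, "regular")]
    # A label attribute only appears if at least one document carries a label.
    if any(doc.label is not None for doc in documents):
        schema.append((label_attribute, "label"))
    # Meta data attributes are only added when add_meta_information is set;
    # keys are collected across all documents and sorted for a stable order.
    if add_meta_information:
        meta_keys = sorted({key for doc in documents for key in doc.metadata})
        schema.extend((key, "regular") for key in meta_keys)
    return schema


docs = [Document("hello", label="greeting", metadata={"date": "2023-01-01"})]
print(output_schema(docs, text_attribute="body", add_meta_information=False))
# prints [('body', 'regular'), ('label', 'label')]
```

Collecting meta data keys across all documents reflects the fact that a data set has one fixed set of attributes, even if individual documents carry different meta information.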